REACH OF COVID-19 IN INDIA¶

Problem Statement¶

Using publicly available datasets, visualize the consequences of the COVID-19 pandemic, which forced people, social by nature, to distance themselves from one another.

Objective¶

To analyze and explore the reach and behavioral variations of the coronavirus disease, popularly known as COVID-19, and its effect on human lives, especially in India.

Importing Essential Packages¶

In [26]:
import pandas as pd
import numpy as np
In [27]:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
In [28]:
import datetime
import plotly.express as px
from sklearn.model_selection import train_test_split 
from sklearn import metrics

Dataset Preprocessing and cleaning¶

In [29]:
dataset_1=pd.read_csv("C:\\Users\\user\\OneDrive\\Desktop\\4th semester\\EDA\\Project\\covid_19_india (1).csv")
dataset2=pd.read_csv("C:\\Users\\user\\OneDrive\\Desktop\\4th semester\\EDA\\Project\\covid_vaccine_statewise.csv")
In [30]:
dataset_1.head() 
Out[30]:
Sno Date Time State/UnionTerritory ConfirmedIndianNational ConfirmedForeignNational Cured Deaths Confirmed
0 1 2020-01-30 6:00 PM Kerala 1 0 0 0 1
1 2 2020-01-31 6:00 PM Kerala 1 0 0 0 1
2 3 2020-02-01 6:00 PM Kerala 2 0 0 0 2
3 4 2020-02-02 6:00 PM Kerala 3 0 0 0 3
4 5 2020-02-03 6:00 PM Kerala 3 0 0 0 3
In [31]:
dataset_1.tail() 
Out[31]:
Sno Date Time State/UnionTerritory ConfirmedIndianNational ConfirmedForeignNational Cured Deaths Confirmed
18105 18106 2021-08-11 8:00 AM Telangana - - 638410 3831 650353
18106 18107 2021-08-11 8:00 AM Tripura - - 77811 773 80660
18107 18108 2021-08-11 8:00 AM Uttarakhand - - 334650 7368 342462
18108 18109 2021-08-11 8:00 AM Uttar Pradesh - - 1685492 22775 1708812
18109 18110 2021-08-11 8:00 AM West Bengal - - 1506532 18252 1534999
In [32]:
dataset_1.dtypes
Out[32]:
Sno                          int64
Date                        object
Time                        object
State/UnionTerritory        object
ConfirmedIndianNational     object
ConfirmedForeignNational    object
Cured                        int64
Deaths                       int64
Confirmed                    int64
dtype: object
In [33]:
dataset_1.shape
Out[33]:
(18110, 9)
In [34]:
dataset_1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18110 entries, 0 to 18109
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Sno                       18110 non-null  int64 
 1   Date                      18110 non-null  object
 2   Time                      18110 non-null  object
 3   State/UnionTerritory      18110 non-null  object
 4   ConfirmedIndianNational   18110 non-null  object
 5   ConfirmedForeignNational  18110 non-null  object
 6   Cured                     18110 non-null  int64 
 7   Deaths                    18110 non-null  int64 
 8   Confirmed                 18110 non-null  int64 
dtypes: int64(4), object(5)
memory usage: 1.2+ MB
In [35]:
dataset_1.isnull()
Out[35]:
Sno Date Time State/UnionTerritory ConfirmedIndianNational ConfirmedForeignNational Cured Deaths Confirmed
0 False False False False False False False False False
1 False False False False False False False False False
2 False False False False False False False False False
3 False False False False False False False False False
4 False False False False False False False False False
... ... ... ... ... ... ... ... ... ...
18105 False False False False False False False False False
18106 False False False False False False False False False
18107 False False False False False False False False False
18108 False False False False False False False False False
18109 False False False False False False False False False

18110 rows × 9 columns

In [36]:
dataset_1.isnull().sum()
Out[36]:
Sno                         0
Date                        0
Time                        0
State/UnionTerritory        0
ConfirmedIndianNational     0
ConfirmedForeignNational    0
Cured                       0
Deaths                      0
Confirmed                   0
dtype: int64
In [37]:
dataset_1.describe()
Out[37]:
Sno Cured Deaths Confirmed
count 18110.000000 1.811000e+04 18110.000000 1.811000e+04
mean 9055.500000 2.786375e+05 4052.402264 3.010314e+05
std 5228.051023 6.148909e+05 10919.076411 6.561489e+05
min 1.000000 0.000000e+00 0.000000 0.000000e+00
25% 4528.250000 3.360250e+03 32.000000 4.376750e+03
50% 9055.500000 3.336400e+04 588.000000 3.977350e+04
75% 13582.750000 2.788698e+05 3643.750000 3.001498e+05
max 18110.000000 6.159676e+06 134201.000000 6.363442e+06
In [38]:
dataset_1.rename(columns={'State/UnionTerritory': 'State'}, inplace=True)
In [39]:
dataset_1["State"].unique()
Out[39]:
array(['Kerala', 'Telengana', 'Delhi', 'Rajasthan', 'Uttar Pradesh',
       'Haryana', 'Ladakh', 'Tamil Nadu', 'Karnataka', 'Maharashtra',
       'Punjab', 'Jammu and Kashmir', 'Andhra Pradesh', 'Uttarakhand',
       'Odisha', 'Puducherry', 'West Bengal', 'Chhattisgarh',
       'Chandigarh', 'Gujarat', 'Himachal Pradesh', 'Madhya Pradesh',
       'Bihar', 'Manipur', 'Mizoram', 'Andaman and Nicobar Islands',
       'Goa', 'Unassigned', 'Assam', 'Jharkhand', 'Arunachal Pradesh',
       'Tripura', 'Nagaland', 'Meghalaya',
       'Dadra and Nagar Haveli and Daman and Diu',
       'Cases being reassigned to states', 'Sikkim', 'Daman & Diu',
       'Lakshadweep', 'Telangana', 'Dadra and Nagar Haveli', 'Bihar****',
       'Madhya Pradesh***', 'Himanchal Pradesh', 'Karanataka',
       'Maharashtra***'], dtype=object)

Dropping rows that have no valid state in the State column¶

In [40]:
dataset_1.drop(dataset_1[dataset_1['State']=="Unassigned"].index, inplace = True)
dataset_1.drop(dataset_1[dataset_1['State']=="Cases being reassigned to states"].index, inplace = True)

Correcting the state names¶

In [41]:
dataset_1.loc[dataset_1["State"]=="Karanataka", "State"]="Karnataka"
dataset_1.loc[dataset_1["State"]=="Bihar****", "State"]="Bihar"
dataset_1.loc[dataset_1["State"]=="Maharashtra***", "State"]="Maharashtra"
dataset_1.loc[dataset_1["State"]=="Andaman and Nicobar Islands", "State"]="Andaman & Nicobar Island"
dataset_1.loc[dataset_1["State"]=="Dadra and Nagar Haveli", "State"]="Dadara & Nagar Havelli"
dataset_1.loc[dataset_1["State"]=="Dadra and Nagar Haveli and Daman and Diu", "State"]="Dadara & Nagar Havelli"
dataset_1.loc[dataset_1["State"]=="Madhya Pradesh***", "State"]="Madhya Pradesh"
dataset_1.loc[dataset_1["State"]=="Himanchal Pradesh", "State"]="Himachal Pradesh"
dataset_1.loc[dataset_1["State"]=="Telengana", "State"]="Telangana"
dataset_1.loc[dataset_1["State"]=="Jammu and Kashmir", "State"]="Jammu & Kashmir"
dataset_1.loc[dataset_1["State"]=="Ladakh", "State"]="Jammu & Kashmir"
dataset_1.loc[dataset_1["State"]=="Delhi", "State"]="NCT of Delhi"
dataset_1.loc[dataset_1["State"]=="Arunachal Pradesh", "State"]="Arunanchal Pradesh"

In this dataset some state names were misspelled or written inconsistently, so the code above maps every variant to a single name per state. Note that Ladakh is merged into Jammu & Kashmir.
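The same corrections can also be kept in a single lookup table, which puts all spelling fixes in one place. The sketch below is only an illustration on a small hand-made frame, not the notebook's own code: the helper `clean_state_names`, the `NAME_FIXES` table and the `NOT_A_STATE` list are hypothetical names, and only a few of the corrections are included.

```python
import pandas as pd

# Hypothetical lookup table: variant spelling -> name used in this analysis.
NAME_FIXES = {
    "Karanataka": "Karnataka",
    "Himanchal Pradesh": "Himachal Pradesh",
    "Telengana": "Telangana",
    "Delhi": "NCT of Delhi",
}

# Placeholder labels that do not refer to any state.
NOT_A_STATE = ["Unassigned", "Cases being reassigned to states"]


def clean_state_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with placeholder rows dropped and state names normalised."""
    out = df[~df["State"].isin(NOT_A_STATE)].copy()
    # Strip footnote asterisks such as 'Bihar****' before applying the lookup.
    out["State"] = out["State"].str.rstrip("*").replace(NAME_FIXES)
    return out


demo = pd.DataFrame({"State": ["Bihar****", "Karanataka", "Unassigned", "Kerala"]})
cleaned = clean_state_names(demo)
print(cleaned["State"].tolist())
```

Stripping the trailing asterisks first means entries such as `Bihar****` and `Maharashtra***` need no individual rule.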

Dropping the unused columns¶

In [42]:
dataset_1.drop(columns=['Sno', 'ConfirmedIndianNational', 'ConfirmedForeignNational', 'Time'], inplace=True)
In [43]:
dataset_1.head(10)
Out[43]:
Date State Cured Deaths Confirmed
0 2020-01-30 Kerala 0 0 1
1 2020-01-31 Kerala 0 0 1
2 2020-02-01 Kerala 0 0 2
3 2020-02-02 Kerala 0 0 3
4 2020-02-03 Kerala 0 0 3
5 2020-02-04 Kerala 0 0 3
6 2020-02-05 Kerala 0 0 3
7 2020-02-06 Kerala 0 0 3
8 2020-02-07 Kerala 0 0 3
9 2020-02-08 Kerala 0 0 3
In [44]:
dataset_1['Date'] = pd.to_datetime(dataset_1['Date'])
In [45]:
print("Number of days of the data sample:",dataset_1['Date'].max()-dataset_1['Date'].min())
Number of days of the data sample: 559 days 00:00:00

Effect of Covid Statewise¶

In [46]:
statewise=dataset_1.groupby("State")[["Confirmed","Cured","Deaths"]].sum().reset_index()
In [47]:
statewise
Out[47]:
State Confirmed Cured Deaths
0 Andaman & Nicobar Island 1938498 1848286 27136
1 Andhra Pradesh 392432753 370426530 2939367
2 Arunanchal Pradesh 7176907 6588149 26799
3 Assam 99837011 92678680 638323
4 Bihar 133662075 126525370 1112347
5 Chandigarh 10858627 10117035 147694
6 Chhattisgarh 163776262 151609364 2063920
7 Dadara & Nagar Havelli 1959354 1862102 1022
8 Daman & Diu 2 0 0
9 Goa 28240159 26027201 447801
10 Gujarat 143420082 132487127 2219448
11 Haryana 134347285 126585342 1502799
12 Himachal Pradesh 30237805 27701150 494855
13 Jammu & Kashmir 62172019 57056301 885498
14 Jharkhand 62111994 58034506 748641
15 Karnataka 488855931 444665851 6089959
16 Kerala 458906023 420174235 1888177
17 Lakshadweep 915784 820925 3908
18 Madhya Pradesh 136416921 127505732 1788258
19 Maharashtra 1127721063 1024765950 23868185
20 Manipur 12617943 11230568 173056
21 Meghalaya 7355969 6537909 101950
22 Mizoram 2984732 2384602 9791
23 NCT of Delhi 287227765 273419887 4943294
24 Nagaland 5041742 4519526 58460
25 Odisha 160130533 150923455 790814
26 Puducherry 20065891 18483117 312155
27 Punjab 99949702 91458159 2785594
28 Rajasthan 162369656 150356820 1473089
29 Sikkim 3186799 2747214 53150
30 Tamil Nadu 431928644 404095807 5916658
31 Telangana 130562647 122154512 750075
32 Tripura 14050250 12976846 150342
33 Uttar Pradesh 312625843 291479351 4143450
34 Uttarakhand 53140414 48362741 986001
35 West Bengal 263107876 247515102 3846989

We add three derived columns, Recovery Rate, Mortality Rate and Active Cases, to describe the condition of each state more accurately.

In [48]:
statewise["Recovery Rate"] = statewise["Cured"]*100 / statewise["Confirmed"]
statewise["Mortality Rate"] = statewise["Deaths"]*100 / statewise["Confirmed"]
statewise["Active Cases"]= statewise["Confirmed"]-(statewise["Cured"]+statewise["Deaths"])
In [49]:
statewise.style.background_gradient(cmap='RdBu_r')
Out[49]:
  State Confirmed Cured Deaths Recovery Rate Mortality Rate Active Cases
0 Andaman & Nicobar Island 1938498 1848286 27136 95.346294 1.399847 63076
1 Andhra Pradesh 392432753 370426530 2939367 94.392358 0.749012 19066856
2 Arunanchal Pradesh 7176907 6588149 26799 91.796494 0.373406 561959
3 Assam 99837011 92678680 638323 92.829983 0.639365 6520008
4 Bihar 133662075 126525370 1112347 94.660636 0.832208 6024358
5 Chandigarh 10858627 10117035 147694 93.170481 1.360154 593898
6 Chhattisgarh 163776262 151609364 2063920 92.571025 1.260207 10102978
7 Dadara & Nagar Havelli 1959354 1862102 1022 95.036527 0.052160 96230
8 Daman & Diu 2 0 0 0.000000 0.000000 2
9 Goa 28240159 26027201 447801 92.163791 1.585689 1765157
10 Gujarat 143420082 132487127 2219448 92.376971 1.547516 8713507
11 Haryana 134347285 126585342 1502799 94.222479 1.118593 6259144
12 Himachal Pradesh 30237805 27701150 494855 91.610982 1.636544 2041800
13 Jammu & Kashmir 62172019 57056301 885498 91.771671 1.424271 4230220
14 Jharkhand 62111994 58034506 748641 93.435265 1.205308 3328847
15 Karnataka 488855931 444665851 6089959 90.960511 1.245757 38100121
16 Kerala 458906023 420174235 1888177 91.559974 0.411452 36843611
17 Lakshadweep 915784 820925 3908 89.641771 0.426738 90951
18 Madhya Pradesh 136416921 127505732 1788258 93.467681 1.310877 7122931
19 Maharashtra 1127721063 1024765950 23868185 90.870516 2.116497 79086928
20 Manipur 12617943 11230568 173056 89.004745 1.371507 1214319
21 Meghalaya 7355969 6537909 101950 88.878963 1.385949 716110
22 Mizoram 2984732 2384602 9791 79.893337 0.328036 590339
23 NCT of Delhi 287227765 273419887 4943294 95.192708 1.721036 8864584
24 Nagaland 5041742 4519526 58460 89.642151 1.159520 463756
25 Odisha 160130533 150923455 790814 94.250267 0.493856 8416264
26 Puducherry 20065891 18483117 312155 92.112117 1.555650 1270619
27 Punjab 99949702 91458159 2785594 91.504184 2.786996 5705949
28 Rajasthan 162369656 150356820 1473089 92.601551 0.907244 10539747
29 Sikkim 3186799 2747214 53150 86.206064 1.667818 386435
30 Tamil Nadu 431928644 404095807 5916658 93.556149 1.369823 21916179
31 Telangana 130562647 122154512 750075 93.560076 0.574494 7658060
32 Tripura 14050250 12976846 150342 92.360250 1.070031 923062
33 Uttar Pradesh 312625843 291479351 4143450 93.235846 1.325370 17003042
34 Uttarakhand 53140414 48362741 986001 91.009342 1.855464 3791672
35 West Bengal 263107876 247515102 3846989 94.073619 1.462134 11745785
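One caveat about the table above: Confirmed, Cured and Deaths in this dataset are running totals (Kerala's first rows read 1, 1, 2, 3, 3), so summing them over every date counts the same cases repeatedly. A minimal sketch of an alternative is shown below, assuming the most recent row per state is the quantity of interest. The numbers are made up, and `demo` and `latest` are illustrative names.

```python
import pandas as pd

# Tiny made-up sample of cumulative counts (not real data).
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2021-08-10", "2021-08-11", "2021-08-10", "2021-08-11"]),
    "State": ["A", "A", "B", "B"],
    "Confirmed": [100, 110, 50, 50],
    "Cured": [80, 90, 40, 45],
    "Deaths": [2, 2, 1, 1],
})

# Keep only the most recent row per state instead of summing running totals.
latest = demo.sort_values("Date").groupby("State", as_index=False).last()

# Same derived columns as in the notebook, now based on the latest snapshot.
latest["Recovery Rate"] = latest["Cured"] * 100 / latest["Confirmed"]
latest["Mortality Rate"] = latest["Deaths"] * 100 / latest["Confirmed"]
latest["Active Cases"] = latest["Confirmed"] - (latest["Cured"] + latest["Deaths"])
print(latest)
```

The rates from the two approaches need not agree, because the sum over dates gives more weight to the periods in which a state reported for longer.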

Visualization by different types of graph¶

In [50]:
statewise1=statewise.copy(deep=True)
fig = px.pie(statewise1, values='Confirmed', names='State',width=800,height=500)
fig.update_layout(title="Confirmed cases in various states",)
From the above pie chart, Maharashtra has the largest share of confirmed cases (20.7%). The share decreases gradually across the other states, and the smallest territories, such as Andaman & Nicobar, each account for less than 1%.¶
In [51]:
statewise1=statewise.copy(deep=True)
#statewise1.loc[statewise1['Cured']< 30000000, 'State'] = 'Other states' 
fig = px.pie(statewise1, values='Cured', names='State',width=800,height=500)
fig.update_layout( title="Cured cases in various states",)
In [52]:
statewise1=statewise.copy(deep=True)
fig = px.pie(statewise1, values='Mortality Rate', names='State',width=800,height=500)
fig.update_layout(title="Mortality rate in various states")
From the above pie chart, the mortality rate was highest in Punjab (2.79), followed by Maharashtra, Uttarakhand and Delhi. The lowest non-zero rate was in Dadra & Nagar Haveli (about 0.05); Daman & Diu shows 0 only because it has just 2 recorded cases.¶
In [53]:
statewise1=statewise.copy(deep=True) 
fig = px.pie(statewise1, values='Recovery Rate', names='State',width=800,height=500)
fig.update_layout(title="Recovery rate in various states")
From the above chart, the highest recovery rate was in Andaman & Nicobar (95.35%), followed by Delhi (95.19%). The lowest recovery rates were in the north-eastern states, along with Maharashtra and Karnataka.¶
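Rates are not parts of a whole, so a ranked table (or a bar chart built from it) is usually easier to read than pie slices. The sketch below ranks a handful of mortality rates copied from the table above; `rates` and `ranked` are illustrative names.

```python
import pandas as pd

# A few mortality rates taken from the statewise table (deliberately unsorted).
rates = pd.DataFrame({
    "State": ["Kerala", "Uttarakhand", "Punjab", "NCT of Delhi", "Maharashtra"],
    "Mortality Rate": [0.411452, 1.855464, 2.786996, 1.721036, 2.116497],
})

# Sort from highest to lowest so the ranking can be read top to bottom.
ranked = rates.sort_values("Mortality Rate", ascending=False).reset_index(drop=True)
print(ranked.round(2))
```

The sorted frame could then be passed to `px.bar` with `x='State'` and `y='Mortality Rate'` to get a chart in which bar height, rather than slice angle, carries the comparison.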

COVID VISUALIZATION WITH TIME PERIOD¶

In [54]:
cases=dataset_1.groupby("Date")[["Cured","Deaths","Confirmed"]].sum().reset_index()
In [55]:
fig=px.bar(cases,x='Date',y=cases.columns[3],)
fig.update_layout(
    title="Total Confirmed cases vs Time",
    xaxis_title="Time Period",
    yaxis_title="Cases",
    legend_title="Cases",
    font=dict(size=14)
)
fig.layout.template = 'presentation'
fig.show()
The chart above shows the time period on the x-axis and the number of confirmed cases on the y-axis. The number of cases barely increased between March and July 2020, rose sharply after that, and increased drastically after March 2021 during the Delta variant wave.¶
In [56]:
fig=px.line(cases,x='Date',y=cases.columns[1:3],)
fig.update_layout(
    title="Total cured cases & deaths vs Time",
    xaxis_title="Time Period",
    yaxis_title="Cases",
    legend_title="Cases",
    font=dict(size=14)
)
fig.layout.template = 'presentation'
fig.show()
The chart above shows the time period on the x-axis and the number of cured cases and deaths on the y-axis. Deaths were low between March and July 2020, rose sharply after that, and increased drastically after March 2021 during the Delta variant wave.¶
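The charts above plot running totals, which can only go up. To see when a wave actually starts and ends, the day-over-day change is often more informative. A minimal sketch on made-up totals follows: `diff()` turns a cumulative series into daily increments, and a short rolling mean smooths reporting noise. The column names `daily_new` and `rolling_3d` are illustrative.

```python
import pandas as pd

# Made-up cumulative totals for five consecutive days.
totals = pd.DataFrame({
    "Date": pd.date_range("2021-03-01", periods=5, freq="D"),
    "Confirmed": [100, 120, 150, 150, 210],
})

# Daily increments: the first day has no predecessor, so it is filled with 0.
totals["daily_new"] = totals["Confirmed"].diff().fillna(0).astype(int)

# 3-day rolling mean of the increments (shorter windows at the start).
totals["rolling_3d"] = totals["daily_new"].rolling(3, min_periods=1).mean()
print(totals)
```

With several states in one frame, the difference would need to be taken per state, for example with `groupby('State')['Confirmed'].diff()`, before summing across states.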

COVID VISUALIZATION WITH RESPECT TO MONTHS & YEAR¶

In [57]:
dataset_1['Date']= pd.to_datetime(dataset_1['Date'])          
data_of_20 = dataset_1.loc[dataset_1.Date.dt.year==2020].copy()   # copy so that adding a column does not write to a slice
data_of_21 = dataset_1.loc[dataset_1.Date.dt.year==2021].copy()
In [58]:
data_of_20['Month']=data_of_20['Date'].dt.month
data_of_21['Month']=data_of_21['Date'].dt.month
data20= data_of_20.groupby('Month')[['Confirmed','Deaths','Cured']].sum()
data21= data_of_21.groupby('Month')[['Confirmed','Deaths','Cured']].sum()
data20.index=pd.to_datetime( data20.index , format = '%m').strftime( '%B' )
data21.index=pd.to_datetime( data21.index , format = '%m').strftime( '%B' )

In [59]:
fig=px.bar(data20,x=data20.index,y=data20.columns[0],)
fig.update_layout(
    title="Total Confirmed over the months in 2020",
    xaxis_title="Months",
    yaxis_title="Cases/deaths",
    legend_title="Types",
    font=dict(size=14)
)
#fig.update_traces(mode='markers+lines')
fig.layout.template = 'presentation'
fig.show()

In the graph above, the monthly total of confirmed cases in 2020 increases gradually at first and then rises drastically from August to December.¶
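The monthly totals can also be produced without first splitting the data by year, which avoids assigning a column to a slice of the original frame. A minimal sketch on made-up rows, grouping on year and month name in one pass, follows; `monthly` and `monthly_2020` are illustrative names.

```python
import pandas as pd

# Made-up rows spanning two years (not real data).
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-04-01", "2021-03-01"]),
    "Confirmed": [1, 2, 5, 9],
    "Deaths": [0, 0, 1, 2],
    "Cured": [0, 1, 2, 6],
})

# Group on (year, month name); sort=False keeps months in order of first appearance,
# which is chronological here because the rows are already sorted by date.
monthly = (
    demo.groupby(
        [demo["Date"].dt.year.rename("Year"),
         demo["Date"].dt.month_name().rename("Month")],
        sort=False,
    )[["Confirmed", "Deaths", "Cured"]]
    .sum()
)

# Select one year; the result is indexed by month name, ready for plotting.
monthly_2020 = monthly.xs(2020, level="Year")
print(monthly_2020)
```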

In [60]:
fig=px.line(data20,x=data20.index,y=data20.columns[1:3],)
fig.update_layout(
    title="Total Cured Cases vs Total Deaths over the months in 2020",
    xaxis_title="Months",
    yaxis_title="Cases/deaths",
    legend_title="Types",
    font=dict(size=14)
)
fig.update_traces(mode='markers+lines')
fig.layout.template = 'presentation'
fig.show()

The line graph above shows that the number of cured cases keeps increasing while the number of deaths stays comparatively low throughout the year, which covers the first COVID wave.¶

In [61]:
fig=px.line(data21,x=data21.index,y=data21.columns[0:3],)
fig.update_layout(
    title="Total Cured Cases -Total Deaths- Total Confirmed in 2021",
    xaxis_title="Months",
    yaxis_title="Number",
    legend_title="Types",
    font=dict(size=14)
)
fig.update_traces(mode='markers+lines')
fig.layout.template = 'presentation'
fig.show()

From the above line graph, the monthly totals for 2021 rise rapidly, with the confirmed figure reaching roughly 1 billion, as the second COVID wave hit India. These values are sums of daily cumulative counts, so they overstate the actual number of cases, and the drop in August at least partly reflects that the dataset ends on 11 August 2021.¶

In [62]:
fig=px.line(dataset_1[dataset_1['State'].isin(['Bihar', 'West Bengal', 'Jharkhand'])],x='Date',y='Deaths',color='State')
fig.update_layout(
    title="Trend of deaths in states",
    xaxis_title="Time Period",
    yaxis_title="Deaths",
    legend_title="State",
    font=dict(size=14)
)
fig.layout.template = 'presentation'
fig.show()
In [63]:
fig=px.line(dataset_1[dataset_1['State'].isin(['Maharashtra', 'Karnataka', 'Kerala', 'Tamil Nadu', 'Andhra Pradesh'])],x='Date',y='Deaths',color='State')
fig.update_layout(
    title="Trend of deaths in states",
    xaxis_title="Time Period",
    yaxis_title="Deaths",
    legend_title="State",
    font=dict(size=14)
)
fig.layout.template = 'presentation'
fig.show()
In [64]:
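corrMatrix=dataset_1[["Cured","Deaths","Confirmed"]].corr()   # select the numeric columns explicitly; recent pandas versions no longer drop the others silently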
print(corrMatrix)
sns.heatmap(corrMatrix, annot = True, cmap= 'coolwarm')
              Cured    Deaths  Confirmed
Cured      1.000000  0.917492   0.997749
Deaths     0.917492  1.000000   0.918308
Confirmed  0.997749  0.918308   1.000000
Out[64]:
<AxesSubplot:>
In [65]:
plt.figure(figsize=(15,15))
sns.heatmap(corrMatrix, annot=True)   # reuse the correlation matrix computed above
Out[65]:
<AxesSubplot:>
In [66]:
dataset_1.plot(kind = 'scatter',x= 'Confirmed', y='Cured', alpha= 0.45,
        s=dataset_1['Deaths']/10000,c= 'Confirmed', cmap = 'jet',
        label='Scatter Plot',title ='Confirmed vs Cured (marker size scaled by Deaths)',figsize= (15,10));
In [67]:
X = np.arange(60)
X = X.reshape(-1,1)
In [68]:
y = dataset_1.iloc[:,-1].values.astype(float)
y
Out[68]:
array([1.000000e+00, 1.000000e+00, 2.000000e+00, ..., 3.424620e+05,
       1.708812e+06, 1.534999e+06])
In [69]:
y = np.diff(y)
y = y.reshape(-1,1)
y
Out[69]:
array([[ 0.00000e+00],
       [ 1.00000e+00],
       [ 1.00000e+00],
       ...,
       [ 2.61802e+05],
       [ 1.36635e+06],
       [-1.73813e+05]])
In [70]:
x=dataset_1['Confirmed']
y=dataset_1['Cured']
plt.plot(x, y, 'o', color='green')
m, b = np.polyfit(x, y, 1)
plt.plot(x, m*x+b, color='blue')
Out[70]:
[<matplotlib.lines.Line2D at 0x18ae2e8d2d0>]
In [71]:
x=dataset_1['Confirmed']
y=dataset_1['Deaths']
plt.plot(x, y, 'o', color='blue')
m, b = np.polyfit(x, y, 1)
plt.plot(x, m*x+b, color='red')
Out[71]:
[<matplotlib.lines.Line2D at 0x18ae43a4910>]
In [72]:
x=dataset_1['Deaths']
y=dataset_1['Cured']
plt.plot(x, y, 'o', color='yellow')
m, b = np.polyfit(x, y, 1)
#use black as color for regression line
plt.plot(x, m*x+b, color='black')
Out[72]:
[<matplotlib.lines.Line2D at 0x18ae441c250>]
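The three cells above draw a least-squares line but do not report how well it fits. A minimal sketch of computing the coefficient of determination R² for a line obtained with `np.polyfit` is given below, on synthetic data; `r_squared` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of the degree-1 least-squares fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, b = np.polyfit(x, y, 1)            # slope and intercept
    residuals = y - (m * x + b)
    ss_res = np.sum(residuals ** 2)       # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_exact = 2 * x + 1
y_noisy = y_exact + np.array([0.5, -0.5, 0.0, 0.5, -0.5])

print(r_squared(x, y_exact))  # a perfect line
print(r_squared(x, y_noisy))  # slightly below that
```

For a straight-line fit with an intercept, R² equals the square of the Pearson correlation, so the 0.997749 between Confirmed and Cured in the correlation matrix corresponds to an R² of about 0.9955.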
In [73]:
from scipy import stats
z= np.abs(stats.zscore(dataset_1['Confirmed']))
print(z)
df1=dataset_1['Confirmed']
0        0.459730
1        0.459730
2        0.459729
3        0.459727
4        0.459727
           ...   
18105    0.530088
18106    0.336969
18107    0.061486
18108    2.141033
18109    1.876494
Name: Confirmed, Length: 18047, dtype: float64
In [74]:
df_outlier=df1[(z<3)]   # keeps the rows within 3 standard deviations, i.e. the non-outliers
In [75]:
df_outlier
Out[75]:
0              1
1              1
2              2
3              3
4              3
          ...   
18105     650353
18106      80660
18107     342462
18108    1708812
18109    1534999
Name: Confirmed, Length: 17665, dtype: int64
In [76]:
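numeric_cols = dataset_1[["Cured", "Deaths", "Confirmed"]]
q1 = numeric_cols.quantile(0.25)   # first quartile
q3 = numeric_cols.quantile(0.75)   # third quartile
iqr = q3 - q1                      # interquartile range
iqr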
Out[76]:
Cured        277287.0
Deaths         3635.5
Confirmed    296893.5
dtype: float64
In [77]:
from scipy import stats
z= np.abs(stats.zscore(dataset_1['Deaths']))
print(z)
df2=dataset_1['Deaths']
0        0.371877
1        0.371877
2        0.371877
3        0.371877
4        0.371877
           ...   
18105    0.021540
18106    0.301188
18107    0.301911
18108    1.710849
18109    1.297230
Name: Deaths, Length: 18047, dtype: float64
In [78]:
df_outlier2=df2[(z<3)]
df_outlier2
Out[78]:
0            0
1            0
2            0
3            0
4            0
         ...  
18105     3831
18106      773
18107     7368
18108    22775
18109    18252
Name: Deaths, Length: 17733, dtype: int64

District Wise Visualization¶

In [79]:
import seaborn as sns;sns.set(style='whitegrid')
%matplotlib inline
from numpy.linalg import pinv,inv
import matplotlib.image as mpimg
import gc
from pandas.plotting import scatter_matrix
import plotly.express as px
In [80]:
df_district = pd.read_csv('C:\\Users\\user\\OneDrive\\Desktop\\4th semester\\EDA\\Project\\district_level_latest.csv')
df_district.head()
Out[80]:
state state code district confirmed active deaths recovered delta_confirmed delta_deceased delta_recovered notes
0 Andaman and Nicobar Islands AN Nicobars 0 0 0 0 0 0 0 NaN
1 Andaman and Nicobar Islands AN North and Middle Andaman 1 0 0 1 0 0 0 NaN
2 Andaman and Nicobar Islands AN South Andaman 32 0 0 32 0 0 0 NaN
3 Andhra Pradesh AP Anantapur 122 62 4 56 4 0 4 NaN
4 Andhra Pradesh AP Chittoor 165 88 0 77 14 4 0 NaN
In [81]:
df_district.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 753 entries, 0 to 752
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   state            753 non-null    object
 1   state code       753 non-null    object
 2   district         753 non-null    object
 3   confirmed        753 non-null    int64 
 4   active           753 non-null    int64 
 5   deaths           753 non-null    int64 
 6   recovered        753 non-null    int64 
 7   delta_confirmed  753 non-null    int64 
 8   delta_deceased   753 non-null    int64 
 9   delta_recovered  753 non-null    int64 
 10  notes            22 non-null     object
dtypes: int64(7), object(4)
memory usage: 64.8+ KB
In [82]:
df_district.describe()
Out[82]:
confirmed active deaths recovered delta_confirmed delta_deceased delta_recovered
count 753.000000 753.000000 753.000000 753.000000 753.000000 753.000000 753.000000
mean 109.177955 68.407703 3.517928 37.244356 0.217795 0.022576 0.100930
std 773.663211 573.060737 30.265097 197.436770 2.198973 0.350941 1.160251
min 0.000000 -372.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 10.000000 3.000000 0.000000 2.000000 0.000000 0.000000 0.000000
75% 41.000000 19.000000 1.000000 18.000000 0.000000 0.000000 0.000000
max 16738.000000 13173.000000 621.000000 3045.000000 45.000000 8.000000 27.000000
In [83]:
df_district.duplicated().sum()
Out[83]:
0
In [84]:
df_district.dtypes
Out[84]:
state              object
state code         object
district           object
confirmed           int64
active              int64
deaths              int64
recovered           int64
delta_confirmed     int64
delta_deceased      int64
delta_recovered     int64
notes              object
dtype: object
In [85]:
df_district.isnull().sum()
Out[85]:
state                0
state code           0
district             0
confirmed            0
active               0
deaths               0
recovered            0
delta_confirmed      0
delta_deceased       0
delta_recovered      0
notes              731
dtype: int64
In [86]:
grouped_df_district = df_district[["state","district","confirmed","active","recovered","deaths"]]
grouped_df_district
Out[86]:
state district confirmed active recovered deaths
0 Andaman and Nicobar Islands Nicobars 0 0 0 0
1 Andaman and Nicobar Islands North and Middle Andaman 1 0 1 0
2 Andaman and Nicobar Islands South Andaman 32 0 32 0
3 Andhra Pradesh Anantapur 122 62 56 4
4 Andhra Pradesh Chittoor 165 88 77 0
... ... ... ... ... ... ...
748 West Bengal Purba Bardhaman 10 7 3 0
749 West Bengal Purba Medinipur 49 22 26 1
750 West Bengal Purulia 0 0 0 0
751 West Bengal South 24 Parganas 79 50 27 2
752 West Bengal Uttar Dinajpur 4 4 0 0

753 rows × 6 columns

In [87]:
grouped_df_district1 = df_district[["district","delta_confirmed","delta_deceased","delta_recovered"]]
grouped_df_district1
Out[87]:
district delta_confirmed delta_deceased delta_recovered
0 Nicobars 0 0 0
1 North and Middle Andaman 0 0 0
2 South Andaman 0 0 0
3 Anantapur 4 0 4
4 Chittoor 14 4 0
... ... ... ... ...
748 Purba Bardhaman 0 0 0
749 Purba Medinipur 0 0 0
750 Purulia 0 0 0
751 South 24 Parganas 0 0 0
752 Uttar Dinajpur 0 0 0

753 rows × 4 columns

In [88]:
grouped_df_district = grouped_df_district.sort_values(by="confirmed",ascending=False)
grouped_df_district = grouped_df_district.reset_index(drop=True)
grouped_df_district
Out[88]:
state district confirmed active recovered deaths
0 Maharashtra Mumbai 16738 13173 2944 621
1 Delhi Delhi_ 7682 4523 3045 114
2 Gujarat Ahmedabad 6910 4198 2247 465
3 Tamil Nadu Chennai 5637 4834 758 45
4 Maharashtra Pune 3314 1762 1377 175
... ... ... ... ... ... ...
748 Manipur Pherzawl 0 0 0 0
749 Manipur Noney 0 0 0 0
750 Manipur Kangpokpi 0 0 0 0
751 Manipur Kamjong 0 0 0 0
752 Manipur Jiribam 0 0 0 0

753 rows × 6 columns

In [89]:
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='active', data=grouped_df_district.nlargest(10,'active'))
plt.show()
In [90]:
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='recovered', data=grouped_df_district.nlargest(10,'recovered'))
plt.show()
In [91]:
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='deaths', data=grouped_df_district.nlargest(10,'deaths'))
plt.show()
In [92]:
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='delta_confirmed', data=grouped_df_district1.nlargest(10,'delta_confirmed'))
plt.show()
In [93]:
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='delta_deceased', data=grouped_df_district1.nlargest(10,'delta_deceased'))
plt.show()
In [94]:
sns.set(rc={'figure.figsize':(15,8)})
sns.barplot(x='district', y='delta_recovered', data=grouped_df_district1.nlargest(10,'delta_recovered'))
plt.show()
In [95]:
data = df_district[df_district.sum(axis=1, numeric_only=True) > 0]   # sum the numeric columns only
data = data.groupby(['state'])['deaths'].sum().reset_index()
data_death = data[data['deaths'] > 0]
state_fig = px.bar(data_death, x='state', y='deaths', title='State-wise COVID-19 deaths reported in India', text='deaths')
state_fig.show()

In [96]:
data = df_district[df_district.sum(axis=1, numeric_only=True) > 0]   # sum the numeric columns only
data = data.groupby(['state'])['recovered'].sum().reset_index()
data_recovered = data[data['recovered'] > 0]
state_fig = px.bar(data_recovered, x='state', y='recovered', title='State-wise COVID-19 recoveries reported in India', text='recovered')
state_fig.show()

In [97]:
df_district['active'] = df_district['confirmed'] - df_district['deaths'] - df_district['recovered']
 
r_data = df_district.groupby("state")[["deaths", "confirmed", "recovered", "active"]].sum().reset_index()
r_data = r_data.sort_values(by='deaths', ascending=False)
r_data = r_data[r_data['deaths']>50]
plt.figure(figsize=(15, 5))
plt.plot(r_data['state'], r_data['deaths'], color='red', label='Deaths')
plt.plot(r_data['state'], r_data['confirmed'], color='green', label='Confirmed')
plt.plot(r_data['state'], r_data['recovered'], color='blue', label='Recovered')
plt.plot(r_data['state'], r_data['active'], color='black', label='Active')
 
plt.title('Deaths (>50), Confirmed, Recovered and Active Cases by State')
plt.legend()
plt.show()

In this graph, the red line shows the number of deaths, green shows confirmed cases, blue shows recovered cases, and black shows the active cases at that time.

In [98]:
df_district["deaths"] = pd.cut(df_district["deaths"],bins=[0., 1.5, 3.0, 4.5, 6., np.inf],labels=[1, 2, 3, 4, 5])
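A note on the binning above: `pd.cut` uses right-closed intervals by default, so with a first edge of 0 a district with zero deaths falls outside every bin and becomes NaN. The sketch below shows this on a few made-up values and writes the bands to separate columns so that the original counts are preserved. The column names `death_band` and `death_band_incl` are hypothetical.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"deaths": [0, 1, 2, 4, 7]})
bins = [0., 1.5, 3.0, 4.5, 6., np.inf]
labels = [1, 2, 3, 4, 5]

# Default: intervals are (0, 1.5], (1.5, 3.0], ... so the value 0 is not covered.
demo["death_band"] = pd.cut(demo["deaths"], bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so 0 lands in band 1.
demo["death_band_incl"] = pd.cut(
    demo["deaths"], bins=bins, labels=labels, include_lowest=True
)
print(demo)
```

This is consistent with the NaN entries that appear in the deaths column of the grouped output further down, since the notebook overwrites `deaths` with the band labels.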
In [99]:
dataset = pd.read_csv('C:\\Users\\user\\OneDrive\\Desktop\\4th semester\\EDA\\Project\\district_level_latest.csv')
x = dataset.iloc[3:, :-1].values
y = dataset.iloc[4:, 1].values
In [100]:
x_train = df_district.active
In [101]:
y_train = df_district.recovered
In [102]:
x_train.head()
Out[102]:
0     0
1     0
2     0
3    62
4    88
Name: active, dtype: int64
In [103]:
plt.scatter(x_train, y_train, color = "red")
plt.title("active VS recovered")
plt.xlabel("Active Case")
plt.ylabel("Recovered")
plt.show()
In [104]:
gb = df_district.groupby('state')
gb.first()
Out[104]:
state code district confirmed active deaths recovered delta_confirmed delta_deceased delta_recovered notes
state
Andaman and Nicobar Islands AN Nicobars 0 0 NaN 0 0 0 0 None
Andhra Pradesh AP Anantapur 122 62 3 56 4 0 4 None
Arunachal Pradesh AR Anjaw 0 0 NaN 0 0 0 0 None
Assam AS Baksa 0 0 1 0 0 0 0 Case tranferred from Nagaland
Bihar BR Araria 4 3 1 1 0 0 0 None
Chandigarh CH Chandigarh 191 151 2 37 0 0 0 None
Chhattisgarh CT Balod 1 1 NaN 0 0 0 0 None
Dadra and Nagar Haveli and Daman and Diu DN Dadra and Nagar Haveli 1 0 NaN 1 0 0 0 None
Delhi DL Central Delhi 184 184 1 0 0 0 0 None
Goa GA North Goa 6 0 NaN 6 0 0 0 None
Gujarat GJ Other State 1 1 5 0 0 0 0 None
Haryana HR Ambala 42 2 2 38 0 0 0 Italian tourists who were treated in Haryana
Himachal Pradesh HP Bilaspur 4 4 1 0 0 0 0 Active cases different due to migrated cases
Jammu and Kashmir JK Anantnag 145 121 1 23 0 0 0 None
Jharkhand JH Bokaro 10 0 1 9 0 0 0 None
Karnataka KA Bagalkote 69 41 1 27 0 0 0 One death on 27th Apr is not included as it's ...
Kerala KL Alappuzha 5 0 1 5 0 0 0 Case of Mahe native who expired in Kannur, add...
Ladakh LA Kargil 9 2 NaN 7 0 0 0 None
Lakshadweep LD Lakshadweep 0 0 NaN 0 0 0 0 None
Madhya Pradesh MP Agar Malwa 13 0 1 12 0 0 0 MP bulletin dated 28 Apr reduced total cases i...
Maharashtra MH Ahmednagar 70 32 2 35 0 0 0 Reconciled as per MH bulleting 24/04
Manipur MN Bishnupur 0 0 NaN 0 0 0 0 None
Meghalaya ML East Garo Hills 0 0 1 0 0 0 0 None
Mizoram MZ Aizawl 1 0 NaN 1 0 0 0 None
Nagaland NL Dimapur 0 0 NaN 0 0 0 0 None
Odisha OR Angul 15 15 1 0 0 0 0 Khorda (except Bhubaneswar municipal corporati...
Puducherry PY Karaikal 1 1 NaN 0 0 0 0 None
Punjab PB Amritsar 298 260 3 34 0 0 0 None
Rajasthan RJ Ajmer 242 125 4 112 0 0 2 Evacuees from other countries; They have been ...
Sikkim SK East District 0 0 NaN 0 0 0 0 None
Tamil Nadu TN Airport Quarantine 9 9 2 0 0 0 0 None
Telangana TG Other State 37 37 5 0 0 0 0 None
Tripura TR Dhalai 152 125 NaN 27 0 0 0 None
Uttar Pradesh UP Agra 785 367 5 394 0 0 0 [14th May] <br>\nConfirmed cases for the distr...
Uttarakhand UT Almora 2 1 1 1 0 0 0 None
West Bengal WB Alipurduar 0 0 1 0 0 0 0 None
In [105]:
gbb = df_district.groupby(['state', 'active'])
gbb.first()
Out[105]:
state code district confirmed deaths recovered delta_confirmed delta_deceased delta_recovered notes
state active
Andaman and Nicobar Islands 0 AN Nicobars 0 NaN 0 0 0 0 None
Andhra Pradesh 3 AP Prakasam 63 NaN 60 0 0 0 None
7 AP Vizianagaram 7 NaN 0 3 0 0 None
17 AP East Godavari 52 NaN 35 1 0 0 None
24 AP West Godavari 69 NaN 45 0 0 5 None
... ... ... ... ... ... ... ... ... ... ...
West Bengal 50 WB South 24 Parganas 79 2 27 0 0 0 None
101 WB Hooghly 135 3 30 0 0 0 None
186 WB North 24 Parganas 317 5 102 0 0 0 None
347 WB Howrah 509 5 135 0 0 0 None
625 WB Kolkata 1157 5 386 0 0 0 None

384 rows × 9 columns


COVID VACCINATION DATASET Visualization¶

In [106]:
dataset2=pd.read_csv("C:\\Users\\user\\OneDrive\\Desktop\\4th semester\\EDA\\Project\\covid_vaccine_statewise.csv")
dataset2
Out[106]:
Updated On State Total Doses Administered Sessions Sites First Dose Administered Second Dose Administered Male (Doses Administered) Female (Doses Administered) Transgender (Doses Administered) ... 18-44 Years (Doses Administered) 45-60 Years (Doses Administered) 60+ Years (Doses Administered) 18-44 Years(Individuals Vaccinated) 45-60 Years(Individuals Vaccinated) 60+ Years(Individuals Vaccinated) Male(Individuals Vaccinated) Female(Individuals Vaccinated) Transgender(Individuals Vaccinated) Total Individuals Vaccinated
0 16/01/2021 India 48276.0 3455.0 2957.0 48276.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 23757.0 24517.0 2.0 48276.0
1 17/01/2021 India 58604.0 8532.0 4954.0 58604.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 27348.0 31252.0 4.0 58604.0
2 18/01/2021 India 99449.0 13611.0 6583.0 99449.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 41361.0 58083.0 5.0 99449.0
3 19/01/2021 India 195525.0 17855.0 7951.0 195525.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 81901.0 113613.0 11.0 195525.0
4 20/01/2021 India 251280.0 25472.0 10504.0 251280.0 0.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN 98111.0 153145.0 24.0 251280.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7840 11/08/2021 West Bengal NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7841 12/08/2021 West Bengal NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7842 13/08/2021 West Bengal NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7843 14/08/2021 West Bengal NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7844 15/08/2021 West Bengal NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

7845 rows × 24 columns

In [107]:
dataset2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7845 entries, 0 to 7844
Data columns (total 24 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Updated On                           7845 non-null   object 
 1   State                                7845 non-null   object 
 2   Total Doses Administered             7621 non-null   float64
 3   Sessions                             7621 non-null   float64
 4    Sites                               7621 non-null   float64
 5   First Dose Administered              7621 non-null   float64
 6   Second Dose Administered             7621 non-null   float64
 7   Male (Doses Administered)            7461 non-null   float64
 8   Female (Doses Administered)          7461 non-null   float64
 9   Transgender (Doses Administered)     7461 non-null   float64
 10   Covaxin (Doses Administered)        7621 non-null   float64
 11  CoviShield (Doses Administered)      7621 non-null   float64
 12  Sputnik V (Doses Administered)       2995 non-null   float64
 13  AEFI                                 5438 non-null   float64
 14  18-44 Years (Doses Administered)     1702 non-null   float64
 15  45-60 Years (Doses Administered)     1702 non-null   float64
 16  60+ Years (Doses Administered)       1702 non-null   float64
 17  18-44 Years(Individuals Vaccinated)  3733 non-null   float64
 18  45-60 Years(Individuals Vaccinated)  3734 non-null   float64
 19  60+ Years(Individuals Vaccinated)    3734 non-null   float64
 20  Male(Individuals Vaccinated)         160 non-null    float64
 21  Female(Individuals Vaccinated)       160 non-null    float64
 22  Transgender(Individuals Vaccinated)  160 non-null    float64
 23  Total Individuals Vaccinated         5919 non-null   float64
dtypes: float64(22), object(2)
memory usage: 1.4+ MB
In [108]:
dataset2.isnull()
Out[108]:
Updated On State Total Doses Administered Sessions Sites First Dose Administered Second Dose Administered Male (Doses Administered) Female (Doses Administered) Transgender (Doses Administered) ... 18-44 Years (Doses Administered) 45-60 Years (Doses Administered) 60+ Years (Doses Administered) 18-44 Years(Individuals Vaccinated) 45-60 Years(Individuals Vaccinated) 60+ Years(Individuals Vaccinated) Male(Individuals Vaccinated) Female(Individuals Vaccinated) Transgender(Individuals Vaccinated) Total Individuals Vaccinated
0 False False False False False False False True True True ... True True True True True True False False False False
1 False False False False False False False True True True ... True True True True True True False False False False
2 False False False False False False False True True True ... True True True True True True False False False False
3 False False False False False False False True True True ... True True True True True True False False False False
4 False False False False False False False True True True ... True True True True True True False False False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7840 False False True True True True True True True True ... True True True True True True True True True True
7841 False False True True True True True True True True ... True True True True True True True True True True
7842 False False True True True True True True True True ... True True True True True True True True True True
7843 False False True True True True True True True True ... True True True True True True True True True True
7844 False False True True True True True True True True ... True True True True True True True True True True

7845 rows × 24 columns

In [109]:
dataset2.isnull().sum()
Out[109]:
Updated On                                0
State                                     0
Total Doses Administered                224
Sessions                                224
 Sites                                  224
First Dose Administered                 224
Second Dose Administered                224
Male (Doses Administered)               384
Female (Doses Administered)             384
Transgender (Doses Administered)        384
 Covaxin (Doses Administered)           224
CoviShield (Doses Administered)         224
Sputnik V (Doses Administered)         4850
AEFI                                   2407
18-44 Years (Doses Administered)       6143
45-60 Years (Doses Administered)       6143
60+ Years (Doses Administered)         6143
18-44 Years(Individuals Vaccinated)    4112
45-60 Years(Individuals Vaccinated)    4111
60+ Years(Individuals Vaccinated)      4111
Male(Individuals Vaccinated)           7685
Female(Individuals Vaccinated)         7685
Transgender(Individuals Vaccinated)    7685
Total Individuals Vaccinated           1926
dtype: int64
In [110]:
# Removing the country-level 'India' rows so that only states and union territories remain
dataset2=dataset2[dataset2.State!='India']
In [111]:
dataset2=dataset2[dataset2['Total Individuals Vaccinated'].notna()]
In [112]:
dataset2=dataset2.drop(labels=["Transgender(Individuals Vaccinated)", "Female(Individuals Vaccinated)", "Male(Individuals Vaccinated)", "60+ Years (Doses Administered)", "45-60 Years (Doses Administered)","18-44 Years (Doses Administered)"],axis=1)
In [113]:
male_vaccinated = dataset2["Male (Doses Administered)"].sum() 
female_vaccinated = dataset2["Female (Doses Administered)"].sum()
male_vaccinated  
Out[113]:
7135565446.0
In [114]:
female_vaccinated
Out[114]:
6318823830.0
In [115]:
fig = px.pie(values=[male_vaccinated ,female_vaccinated], names=["Male Vaccinated","Female Vaccinated"], width=800,height=500)
fig.update_layout(
    title="Gender wise vaccination status",
    legend_title="Gender",
    font=dict(size=14)
)
fig.layout.template = 'presentation'
fig.show()

The pie chart shows that about 53% of the doses were administered to men and about 47% to women.
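The percentages in the pie chart follow directly from the two column sums printed above. A minimal check, using only those two reported values:

```python
# Column sums reported in the notebook outputs above
male_doses = 7135565446.0
female_doses = 6318823830.0

total_doses = male_doses + female_doses
male_share = 100 * male_doses / total_doses
female_share = 100 * female_doses / total_doses

print(f"male: {male_share:.1f}%, female: {female_share:.1f}%")
```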

In [116]:
statewise_vaccination=dataset2.groupby("State")[["First Dose Administered","Second Dose Administered"]].sum().reset_index()
In [117]:
statewise_vaccination
Out[117]:
State First Dose Administered Second Dose Administered
0 Andaman and Nicobar Islands 8.083888e+06 1141995.0
1 Andhra Pradesh 5.629879e+08 160345737.0
2 Arunachal Pradesh 2.099771e+07 5752060.0
3 Assam 2.392148e+08 57541214.0
4 Bihar 6.589511e+08 126284969.0
5 Chandigarh 1.969515e+07 4951484.0
6 Chhattisgarh 4.340759e+08 80327528.0
7 Dadra and Nagar Haveli and Daman and Diu 1.133797e+07 1776446.0
8 Delhi 3.049722e+08 84115315.0
9 Goa 3.204142e+07 6800934.0
10 Gujarat 1.074926e+09 280843871.0
11 Haryana 3.630617e+08 65489121.0
12 Himachal Pradesh 1.500760e+08 29079195.0
13 Jammu and Kashmir 2.034292e+08 39287506.0
14 Jharkhand 2.882814e+08 54330129.0
15 Karnataka 8.663366e+08 182179479.0
16 Kerala 6.189776e+08 144617802.0
17 Ladakh 9.447258e+06 2611222.0
18 Lakshadweep 2.120319e+06 482625.0
19 Madhya Pradesh 7.697363e+08 130607873.0
20 Maharashtra 1.400431e+09 301105538.0
21 Manipur 2.659080e+07 5747319.0
22 Meghalaya 2.713678e+07 5842387.0
23 Mizoram 2.050252e+07 4038784.0
24 Nagaland 1.756547e+07 4124561.0
25 Odisha 5.087671e+08 107810476.0
26 Puducherry 1.773671e+07 3270153.0
27 Punjab 2.871185e+08 49267328.0
28 Rajasthan 1.138229e+09 227002050.0
29 Sikkim 1.608638e+07 4112968.0
30 Tamil Nadu 5.429936e+08 131822416.0
31 Telangana 3.919721e+08 81567248.0
32 Tripura 9.348524e+07 33297280.0
33 Uttar Pradesh 1.196438e+09 259005880.0
34 Uttarakhand 1.741822e+08 46557931.0
35 West Bengal 9.226559e+08 256717715.0
In [118]:
vaccination=dataset2.pivot_table( index = 'State', values = ['First Dose Administered','Second Dose Administered'], aggfunc = 'sum' ).reset_index()
vaccination.style.background_gradient(cmap='twilight')
Out[118]:
  State First Dose Administered Second Dose Administered
0 Andaman and Nicobar Islands 8083888.000000 1141995.000000
1 Andhra Pradesh 562987902.000000 160345737.000000
2 Arunachal Pradesh 20997713.000000 5752060.000000
3 Assam 239214775.000000 57541214.000000
4 Bihar 658951108.000000 126284969.000000
5 Chandigarh 19695148.000000 4951484.000000
6 Chhattisgarh 434075946.000000 80327528.000000
7 Dadra and Nagar Haveli and Daman and Diu 11337973.000000 1776446.000000
8 Delhi 304972186.000000 84115315.000000
9 Goa 32041420.000000 6800934.000000
10 Gujarat 1074926034.000000 280843871.000000
11 Haryana 363061708.000000 65489121.000000
12 Himachal Pradesh 150075973.000000 29079195.000000
13 Jammu and Kashmir 203429155.000000 39287506.000000
14 Jharkhand 288281448.000000 54330129.000000
15 Karnataka 866336587.000000 182179479.000000
16 Kerala 618977564.000000 144617802.000000
17 Ladakh 9447258.000000 2611222.000000
18 Lakshadweep 2120319.000000 482625.000000
19 Madhya Pradesh 769736314.000000 130607873.000000
20 Maharashtra 1400430993.000000 301105538.000000
21 Manipur 26590795.000000 5747319.000000
22 Meghalaya 27136779.000000 5842387.000000
23 Mizoram 20502516.000000 4038784.000000
24 Nagaland 17565474.000000 4124561.000000
25 Odisha 508767148.000000 107810476.000000
26 Puducherry 17736714.000000 3270153.000000
27 Punjab 287118510.000000 49267328.000000
28 Rajasthan 1138229441.000000 227002050.000000
29 Sikkim 16086375.000000 4112968.000000
30 Tamil Nadu 542993553.000000 131822416.000000
31 Telangana 391972116.000000 81567248.000000
32 Tripura 93485242.000000 33297280.000000
33 Uttar Pradesh 1196437796.000000 259005880.000000
34 Uttarakhand 174182247.000000 46557931.000000
35 West Bengal 922655934.000000 256717715.000000
In [119]:
vaccination_gender=dataset2.pivot_table( index = 'State', values = ['Male (Doses Administered)','Female (Doses Administered)'], aggfunc = 'sum' ).reset_index()
vaccination_gender.style.background_gradient(cmap='RdBu_r')
Out[119]:
  State Female (Doses Administered) Male (Doses Administered)
0 Andaman and Nicobar Islands 3713987.000000 4387523.000000
1 Andhra Pradesh 282110452.000000 282404176.000000
2 Arunachal Pradesh 9320135.000000 11753535.000000
3 Assam 109322106.000000 130411866.000000
4 Bihar 311444792.000000 349289470.000000
5 Chandigarh 8381208.000000 11348264.000000
6 Chhattisgarh 223617997.000000 211645756.000000
7 Dadra and Nagar Haveli and Daman and Diu 4309194.000000 7048017.000000
8 Delhi 125854689.000000 179823973.000000
9 Goa 15687251.000000 16425164.000000
10 Gujarat 500611622.000000 577514466.000000
11 Haryana 168890962.000000 194807605.000000
12 Himachal Pradesh 76280181.000000 74186948.000000
13 Jammu and Kashmir 81029066.000000 122692650.000000
14 Jharkhand 136860180.000000 152254829.000000
15 Karnataka 440647051.000000 427750736.000000
16 Kerala 335233013.000000 285509478.000000
17 Ladakh 4308849.000000 5156509.000000
18 Lakshadweep 909442.000000 1215081.000000
19 Madhya Pradesh 351430006.000000 420324751.000000
20 Maharashtra 648834014.000000 754051696.000000
21 Manipur 11250938.000000 15398920.000000
22 Meghalaya 13043905.000000 14158601.000000
23 Mizoram 9976825.000000 10594236.000000
24 Nagaland 7032326.000000 10590660.000000
25 Odisha 242718934.000000 267723377.000000
26 Puducherry 8645162.000000 9113173.000000
27 Punjab 123351778.000000 164160913.000000
28 Rajasthan 547146906.000000 593849036.000000
29 Sikkim 7440996.000000 8693029.000000
30 Tamil Nadu 256064789.000000 287615366.000000
31 Telangana 189034200.000000 204262462.000000
32 Tripura 44935091.000000 48845211.000000
33 Uttar Pradesh 518774798.000000 681560268.000000
34 Uttarakhand 84788817.000000 89916330.000000
35 West Bengal 415822168.000000 509081371.000000

Visualization by Graph¶

In [120]:
vaccination=dataset2.pivot_table( index = 'State', values = ['Total Individuals Vaccinated'], aggfunc = 'sum' ).sort_values(by = ['Total Individuals Vaccinated'],ascending=False).reset_index()
In [121]:
fig = px.bar( vaccination, x='Total Individuals Vaccinated',y='State', color ='State',width=900, height=550) 
fig.update_layout(
    title="States with number of vaccinated individuals",
    xaxis_title="Total Individuals Vaccinated",
    yaxis_title="State",
    legend_title="State",
    font=dict(
        size=14
    )
)
fig.layout.template = 'presentation'
fig.show()

This bar chart ranks the states from the highest to the lowest number of vaccinated individuals. Maharashtra has the highest number, followed by Uttar Pradesh, Rajasthan, Gujarat and others.
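The totals above come from summing every daily row per state. If the vaccination columns are running totals (an assumption; the India rows in the preview only ever increase), the latest record per state may be a closer estimate of the number of people vaccinated. A minimal sketch on made-up numbers, with hypothetical state names:

```python
import pandas as pd

# Made-up cumulative records for two hypothetical states
toy = pd.DataFrame({
    "Updated On": ["16/01/2021", "17/01/2021", "18/01/2021",
                   "16/01/2021", "17/01/2021", "18/01/2021"],
    "State": ["State A", "State A", "State A",
              "State B", "State B", "State B"],
    "Total Individuals Vaccinated": [10.0, 25.0, 40.0, 5.0, 5.0, 12.0],
})

# Parse the dd/mm/yyyy strings so that sorting is chronological
toy["Updated On"] = pd.to_datetime(toy["Updated On"], format="%d/%m/%Y")

# Summing running totals counts earlier doses again on every later day
summed = toy.groupby("State")["Total Individuals Vaccinated"].sum()

# Taking the last record per state keeps only the most recent running total
latest = (
    toy.sort_values("Updated On")
       .groupby("State")["Total Individuals Vaccinated"]
       .last()
)

print(summed.to_dict())
print(latest.to_dict())
```

The ranking of states is often similar under both approaches, but the magnitudes differ greatly, which matters when the totals are quoted as counts of people.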

In [122]:
# Top 15 states with highest number of vaccination 
vaccination1=dataset2.pivot_table( index = 'State', values = ['Total Individuals Vaccinated'], aggfunc = 'sum' ).sort_values(by = ['Total Individuals Vaccinated'],ascending=False).reset_index().head(15)
In [123]:
fig = px.bar( vaccination1, x='State',y='Total Individuals Vaccinated',width=900, height=550) 
fig.update_layout(
    title="Top 15 states by number of vaccinated individuals",
    xaxis_title="State",
    yaxis_title="Total Individuals Vaccinated",
    legend_title="State",
    font=dict(
        size=14
    )
)
fig.layout.template = 'presentation'
fig.show()
In [124]:
# Doses administered to men in every state
vaccination2=dataset2.pivot_table( index = 'State', values = ['Male (Doses Administered)'], aggfunc = 'sum' ).reset_index()
fig = px.bar( vaccination2, x='State',y='Male (Doses Administered)',width=900, height=550) 
fig.update_layout(
    title="States with number of male vaccinated",
    xaxis_title="State",
    yaxis_title="Doses",
    legend_title="State",
    font=dict(
        size=14
    )
)
fig.layout.template = 'presentation'
fig.show()

It shows the number of doses administered to men in different states.

In [125]:
vaccination3=dataset2.pivot_table( index = 'State', values = ['Female (Doses Administered)'], aggfunc = 'sum' ).reset_index()
fig = px.bar( vaccination3, x='State',y='Female (Doses Administered)',width=900, height=550) 
fig.update_layout(
    title="States with number of female vaccinated",
    xaxis_title="State",
    yaxis_title="Doses",
    legend_title="State",
    font=dict(
        size=14
    )
)
fig.layout.template = 'presentation'
fig.show()

It shows the number of doses administered to women in different states.

In [126]:
vaccination4=dataset2.pivot_table( index = 'State', values = ['Male (Doses Administered)'], aggfunc = 'sum' ).sort_values(by = ['Male (Doses Administered)'],ascending=False).reset_index().head(15)
fig = px.bar( vaccination4, x='State',y='Male (Doses Administered)',width=900, height=550) 
fig.update_layout(
    title="Top 15 States with number of Male vaccinated",
    xaxis_title="State",
    yaxis_title="Doses",
    legend_title="State",
    font=dict(
        size=14
    )
)
fig.layout.template = 'presentation'
fig.show()
In [127]:
vaccination5=dataset2.pivot_table( index = 'State', values = ['Female (Doses Administered)'], aggfunc = 'sum' ).sort_values(by = ['Female (Doses Administered)'],ascending=False).reset_index().head(15)
fig = px.bar( vaccination5, x='State',y='Female (Doses Administered)',width=900, height=550) 
fig.update_layout(
    title="Top 15 States with number of female vaccinated",
    xaxis_title="State",
    yaxis_title="Doses",
    legend_title="State",
    font=dict(
        size=14
    )
)
fig.layout.template = 'presentation'
fig.show()

Visualizing by types of Vaccine¶

In [128]:
vaccine6 = dataset2[" Covaxin (Doses Administered)"].sum() 
vaccine6
Out[128]:
1716565459.0
In [129]:
vaccine7 = dataset2["CoviShield (Doses Administered)"].sum()
vaccine7
Out[129]:
14640233875.0
In [130]:
fig = px.pie(values=[vaccine6,vaccine7], names=["Covaxin","CoviShield"],width=800,height=500)
fig.update_layout(
    title="Vaccine (Doses Administered)",
    legend_title="Vaccine Name",
    font=dict(
        size=14
    )
)
fig.show()

The pie chart shows that CoviShield accounts for the large majority of doses administered, about 89.5%, while Covaxin accounts for only about 10.5%.

Visualization by age group¶

In [131]:
agewise=dataset2.pivot_table( index = 'State', values = ['18-44 Years(Individuals Vaccinated)','45-60 Years(Individuals Vaccinated)','60+ Years(Individuals Vaccinated)'], aggfunc = 'sum' ).sort_values(by = ['18-44 Years(Individuals Vaccinated)','45-60 Years(Individuals Vaccinated)','60+ Years(Individuals Vaccinated)'],ascending=False).reset_index()
In [132]:
agewise
Out[132]:
State 18-44 Years(Individuals Vaccinated) 45-60 Years(Individuals Vaccinated) 60+ Years(Individuals Vaccinated)
0 Uttar Pradesh 244892552.0 488909436.0 414090525.0
1 Maharashtra 241658734.0 584319250.0 530095028.0
2 Gujarat 231453106.0 426820821.0 376690790.0
3 Rajasthan 181950995.0 429163746.0 484094715.0
4 Madhya Pradesh 177578823.0 296933460.0 267982290.0
5 West Bengal 163820159.0 373380603.0 348339928.0
6 Karnataka 162778610.0 353393295.0 325517613.0
7 Tamil Nadu 147393019.0 217706722.0 161943358.0
8 Bihar 145118819.0 225690190.0 264843188.0
9 Andhra Pradesh 101023557.0 254203320.0 187877645.0
10 Delhi 90950668.0 119317570.0 82315060.0
11 Haryana 85237095.0 128190677.0 139849643.0
12 Kerala 82660559.0 220170317.0 294584407.0
13 Telangana 77762244.0 179203007.0 123689812.0
14 Odisha 74107195.0 204669841.0 210911486.0
15 Jharkhand 63320139.0 109207209.0 105121398.0
16 Assam 61397631.0 107917402.0 61222783.0
17 Punjab 59333158.0 121378482.0 100232366.0
18 Chhattisgarh 41727384.0 229332983.0 148175206.0
19 Jammu and Kashmir 34068964.0 95141956.0 66307979.0
20 Uttarakhand 33983867.0 68695577.0 65624574.0
21 Himachal Pradesh 17412755.0 68793840.0 59984689.0
22 Tripura 13500603.0 48965819.0 27726999.0
23 Manipur 9858542.0 10185405.0 4939201.0
24 Meghalaya 9196351.0 11413151.0 5268632.0
25 Goa 7514873.0 12191774.0 11403306.0
26 Arunachal Pradesh 7460518.0 8984788.0 3566261.0
27 Nagaland 6399551.0 6638742.0 3655521.0
28 Puducherry 5471417.0 6712395.0 5125846.0
29 Chandigarh 5287721.0 7913481.0 5823412.0
30 Dadra and Nagar Haveli and Daman and Diu 5054593.0 4062240.0 1719728.0
31 Mizoram 4826364.0 8863596.0 5915492.0
32 Sikkim 3557099.0 7119614.0 4728829.0
33 Ladakh 3240510.0 3040285.0 2759736.0
34 Andaman and Nicobar Islands 1223324.0 4376537.0 2243271.0
35 Lakshadweep 591777.0 925093.0 528243.0
In [133]:
fig=px.scatter(agewise,x='State',y=['18-44 Years(Individuals Vaccinated)','45-60 Years(Individuals Vaccinated)','60+ Years(Individuals Vaccinated)'])
fig.update_layout(
    title="Individuals vaccinated by age group",
    xaxis_title="States",
    yaxis_title="Individuals vaccinated",
    font=dict(
        size=14
    )
)
fig.show()

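From the scatter plot and the table above we can observe that, in most of the larger states, the 45-60 age group has the highest total, usually followed by 60+, while 18-44 has the lowest. A few states differ: in Rajasthan, Bihar and Kerala the 60+ group leads, and in Ladakh the 18-44 group leads.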
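One way to check which age group leads in each state is a row-wise `idxmax`. The sketch below uses three rows copied from the `agewise` table above; applying the same call to the full `agewise` frame (after setting `State` as the index) would cover every state.

```python
import pandas as pd

age_cols = [
    "18-44 Years(Individuals Vaccinated)",
    "45-60 Years(Individuals Vaccinated)",
    "60+ Years(Individuals Vaccinated)",
]

# Three rows copied from the agewise table above
sample = pd.DataFrame(
    [
        [244892552.0, 488909436.0, 414090525.0],
        [181950995.0, 429163746.0, 484094715.0],
        [3240510.0, 3040285.0, 2759736.0],
    ],
    index=["Uttar Pradesh", "Rajasthan", "Ladakh"],
    columns=age_cols,
)

# idxmax(axis=1) returns, for each row, the column holding the largest value
leading_group = sample.idxmax(axis=1)
print(leading_group)

# Counting the result shows how often each age group leads
print(leading_group.value_counts())
```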

In [134]:
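totalvaccine=dataset2.pivot_table( index = 'Updated On', values = 'Total Individuals Vaccinated', aggfunc = 'sum' )
# 'Updated On' holds dd/mm/yyyy strings; parse and sort them so the x-axis is chronological
totalvaccine.index=pd.to_datetime(totalvaccine.index, format='%d/%m/%Y')
totalvaccine=totalvaccine.sort_index()
fig=px.area(totalvaccine,x=totalvaccine.index,y='Total Individuals Vaccinated')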
fig.update_layout(
    title="Total no. of individual vaccinated",
    xaxis_title="Time Period",
    yaxis_title="Doses",
    font=dict(
        size=14
    )
)
fig.layout.template = 'presentation'
fig.show()
In [135]:
dataset2.plot(kind = 'scatter',x= 'Female (Doses Administered)', y='Male (Doses Administered)', alpha= 0.45,
        s=dataset2['Total Doses Administered']/1000000,c= 'Total Doses Administered', cmap = 'jet',
        label='Total Doses Administered (millions, marker size)',title ='Male vs Female Doses Administered',figsize= (15,10));

Regression¶

In [136]:
dataset2=dataset2.drop('Updated On',axis=1)
In [137]:
dataset2=dataset2.drop('State',axis=1);
In [138]:
percent_missing = dataset2.isnull().sum() * 100 / len(dataset2)
missing_value_df1 = pd.DataFrame({'column_name': dataset2.columns,
                                 'percent_missing': percent_missing})
missing_value_df1.sort_values('percent_missing', inplace=True)
missing_value_df1
Out[138]:
column_name percent_missing
Total Doses Administered Total Doses Administered 0.000000
Sessions Sessions 0.000000
Sites Sites 0.000000
First Dose Administered First Dose Administered 0.000000
Second Dose Administered Second Dose Administered 0.000000
Male (Doses Administered) Male (Doses Administered) 0.000000
Female (Doses Administered) Female (Doses Administered) 0.000000
Transgender (Doses Administered) Transgender (Doses Administered) 0.000000
Covaxin (Doses Administered) Covaxin (Doses Administered) 0.000000
CoviShield (Doses Administered) CoviShield (Doses Administered) 0.000000
Total Individuals Vaccinated Total Individuals Vaccinated 0.000000
AEFI AEFI 36.881403
45-60 Years(Individuals Vaccinated) 45-60 Years(Individuals Vaccinated) 36.916131
60+ Years(Individuals Vaccinated) 60+ Years(Individuals Vaccinated) 36.916131
18-44 Years(Individuals Vaccinated) 18-44 Years(Individuals Vaccinated) 36.933495
Sputnik V (Doses Administered) Sputnik V (Doses Administered) 78.155930
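The missing-value percentage computed above can also be written with `isnull().mean()`, since the mean of a boolean column is the fraction of True values. A short sketch on a made-up frame (the column names `a`, `b`, `c` are placeholders):

```python
import numpy as np
import pandas as pd

# Made-up frame with a known number of missing values per column
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 1.0, 2.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# The form used in the notebook: count of nulls divided by the number of rows
by_sum = toy.isnull().sum() * 100 / len(toy)

# Equivalent shorter form: mean of the boolean mask
by_mean = toy.isnull().mean() * 100

print(by_mean.sort_values())
```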
In [139]:
# Dropping the columns with a large share of missing values before regression
dataset2 = dataset2.drop(labels=['AEFI','45-60 Years(Individuals Vaccinated)','60+ Years(Individuals Vaccinated)','18-44 Years(Individuals Vaccinated)','Sputnik V (Doses Administered)'],axis=1)
In [140]:
dataset2.head()
Out[140]:
Total Doses Administered Sessions Sites First Dose Administered Second Dose Administered Male (Doses Administered) Female (Doses Administered) Transgender (Doses Administered) Covaxin (Doses Administered) CoviShield (Doses Administered) Total Individuals Vaccinated
212 23.0 2.0 2.0 23.0 0.0 12.0 11.0 0.0 0.0 23.0 23.0
213 23.0 2.0 2.0 23.0 0.0 12.0 11.0 0.0 0.0 23.0 23.0
214 42.0 9.0 2.0 42.0 0.0 29.0 13.0 0.0 0.0 42.0 42.0
215 89.0 12.0 2.0 89.0 0.0 53.0 36.0 0.0 0.0 89.0 89.0
216 124.0 16.0 3.0 124.0 0.0 67.0 57.0 0.0 0.0 124.0 124.0
In [141]:
x_train=dataset2.drop('Total Doses Administered',axis=1)
In [142]:
y_train=dataset2['Total Doses Administered']
In [143]:
x_train.head()
Out[143]:
Sessions Sites First Dose Administered Second Dose Administered Male (Doses Administered) Female (Doses Administered) Transgender (Doses Administered) Covaxin (Doses Administered) CoviShield (Doses Administered) Total Individuals Vaccinated
212 2.0 2.0 23.0 0.0 12.0 11.0 0.0 0.0 23.0 23.0
213 2.0 2.0 23.0 0.0 12.0 11.0 0.0 0.0 23.0 23.0
214 9.0 2.0 42.0 0.0 29.0 13.0 0.0 0.0 42.0 42.0
215 12.0 2.0 89.0 0.0 53.0 36.0 0.0 0.0 89.0 89.0
216 16.0 3.0 124.0 0.0 67.0 57.0 0.0 0.0 124.0 124.0
In [144]:
y_train.head()
Out[144]:
212     23.0
213     23.0
214     42.0
215     89.0
216    124.0
Name: Total Doses Administered, dtype: float64

Linear Regression¶

In [145]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
In [146]:
model.fit(x_train,y_train)
Out[146]:
LinearRegression()
In [147]:
model.score(x_train,y_train)
Out[147]:
1.0
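A score of exactly 1.0 usually means that the target can be reconstructed from the features. In the rows shown above, `Total Doses Administered` equals `First Dose Administered` plus `Second Dose Administered`, and both are among the features. The sketch below reproduces that situation on synthetic data (made-up numbers, not the vaccination file) and scores the model on a held-out split using `train_test_split`, which the notebook already imports.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for first and second doses (made-up values)
first_dose = rng.integers(0, 1000, size=200).astype(float)
second_dose = rng.integers(0, 500, size=200).astype(float)

X = np.column_stack([first_dose, second_dose])
y = first_dose + second_dose  # the target is an exact sum of the features

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

reg = LinearRegression().fit(X_tr, y_tr)
test_r2 = reg.score(X_te, y_te)

print(test_r2)    # R^2 on rows the model has not seen
print(reg.coef_)  # both coefficients are close to 1
```

Even on unseen rows the score stays at essentially 1.0 here, because the model only has to learn an addition. Dropping the columns that add up to the target would give a more informative score.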
In [148]:
lw = 2
plt.figure(figsize=(10, 10))
plt.title("Actual vs predicted total doses (linear regression)")
# plot both series against row position so the two curves line up
plt.plot(y_train.values, color="gold", linewidth=lw, label="data")
plt.plot(model.predict(x_train), color="blue", linestyle="--", label="linear estimate")
plt.xlabel("Row position")
plt.ylabel("Total Doses Administered")
plt.legend(loc="upper left", prop=dict(size=12))
Out[148]:
<matplotlib.legend.Legend at 0x18ae571b790>

Random Forest Regressor¶

A random forest regressor is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and averages their predictions to improve predictive accuracy and control over-fitting.
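The averaging step can be checked by hand. The sketch below fits a small forest on made-up data and compares the forest prediction with the mean of the individual tree predictions taken from `estimators_`; the data and the choice of 10 trees are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))             # made-up features
y = X[:, 0] * 2.0 + rng.normal(0, 0.5, size=100)  # made-up noisy target

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Each fitted tree is available in estimators_; average their predictions by hand
per_tree = np.array([tree.predict(X) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# Compare the hand-computed average with the forest prediction
print(np.allclose(manual_average, forest.predict(X)))
```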

In [149]:
from sklearn.ensemble import RandomForestRegressor
In [150]:
model1 = RandomForestRegressor()
In [151]:
model1.fit(x_train,y_train)
Out[151]:
RandomForestRegressor()
In [152]:
model1.score(x_train,y_train)
Out[152]:
0.999985726126071
In [153]:
lw = 2
plt.figure(figsize=(10, 10))
plt.title("Actual vs predicted total doses (random forest)")
# plot both series against row position so the two curves line up
plt.plot(y_train.values, color="red", linewidth=lw, label="data")
plt.plot(model1.predict(x_train), color="blue", linestyle="--", label="random forest estimate")
plt.xlabel("Row position")
plt.ylabel("Total Doses Administered")
plt.legend(loc="upper left", prop=dict(size=12))
Out[153]:
<matplotlib.legend.Legend at 0x18ae52fae00>

Bayesian Ridge¶

Bayesian regression provides a natural mechanism for coping with insufficient or poorly distributed data by formulating linear regression with probability distributions rather than point estimates. The output or response ‘y’ is assumed to be drawn from a probability distribution rather than estimated as a single value.
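In scikit-learn this shows up in `BayesianRidge.predict`, which can return a standard deviation alongside each predicted mean when `return_std=True` is passed. A minimal sketch on made-up data (not the vaccination file):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))             # made-up feature
y = 3.0 * X[:, 0] + rng.normal(0, 1.0, size=50)  # made-up noisy target

bayes = BayesianRidge().fit(X, y)

# return_std=True also returns the standard deviation of the predictive distribution
X_new = np.array([[2.0], [5.0], [8.0]])
mean, std = bayes.predict(X_new, return_std=True)

for m, s in zip(mean, std):
    print(f"prediction {m:.2f} +/- {s:.2f}")
```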

In [154]:
from sklearn.linear_model import BayesianRidge
In [155]:
model2 = BayesianRidge(compute_score=True)
In [156]:
model2.fit(x_train,y_train)
Out[156]:
BayesianRidge(compute_score=True)
In [157]:
model2.score(x_train,y_train)
Out[157]:
0.9999999999999992
In [158]:
lw = 2
plt.figure(figsize=(10, 10))
plt.title("Actual vs predicted total doses (Bayesian ridge)")
# plot both series against row position so the two curves line up
plt.plot(y_train.values, color="red", linewidth=lw, label="data")
plt.plot(model2.predict(x_train), color="green", linestyle="--", label="bayesian estimate")
plt.xlabel("Row position")
plt.ylabel("Total Doses Administered")
plt.legend(loc="upper left", prop=dict(size=12))
Out[158]:
<matplotlib.legend.Legend at 0x18ae536b040>
In [159]:
lw = 2
plt.figure(figsize=(10, 10))
plt.title("Actual vs predicted total doses (all models)")
# plot every series against row position so the curves line up
plt.plot(y_train.values, color="red", linewidth=lw, label="data")
plt.plot(model2.predict(x_train), color="green", linestyle="--", label="bayesian estimate")
plt.plot(model1.predict(x_train), color="blue", linestyle="--", label="random forest estimate")
plt.plot(model.predict(x_train), color="gold", linestyle="--", label="linear estimate")
plt.xlabel("Row position")
plt.ylabel("Total Doses Administered")
plt.legend(loc="upper left", prop=dict(size=12))
Out[159]:
<matplotlib.legend.Legend at 0x18ae05efac0>
In [160]:
plt.figure(figsize=(6, 5))
plt.title("Marginal log-likelihood")
plt.plot(model2.scores_, color="navy", linewidth=lw)
plt.ylabel("Score")
plt.xlabel("Iterations")
Out[160]:
Text(0.5, 0, 'Iterations')

Conclusion¶

The coronavirus disease continues to spread across the world along a trajectory that is difficult to predict. The health, humanitarian and socio-economic policies adopted by countries will determine the speed and strength of the recovery. The analysis above shows that COVID-19 hit India severely: many people died and many recovered. States with more international contact, such as Maharashtra, Karnataka, Tamil Nadu, Kerala, Gujarat and Delhi, appear to have suffered more than others. Summed over the daily records in our dataset, the figures come to around 5 billion cured and 73 million deaths. The vaccination analysis suggests that the government carried out the vaccination drive efficiently: over the period covered by the data, the daily records sum to about 13.5 billion doses, of which about 7.1 billion went to men and 6.3 billion to women. Because the daily records appear to be running totals, these sums count the same cases and doses many times and are far larger than the number of people actually affected or vaccinated.